Cancer is one of the most important health problems that threaten human life. The likelihood of curing cancer increases with early diagnosis and correct grading, for which histopathological examination is routinely used. The developed novel model uses both structural and statistical pattern recognition techniques to locate and characterize the biological structures in a tissue image for tissue quantification. The approach has three main steps: graph generation for tissue images and query glands, localization of key regions, and feature extraction from the key regions. Unlike conventional approaches, this model quantifies the located key regions with structural and textural features extracted from the images. Based on the extracted features, it then classifies the images into two groups, low grade and high grade, using SVM (Support Vector Machine) classifiers. The developed model achieves higher classification accuracies than conventional approaches that use only statistical techniques for tissue quantification.

KEYWORDS: Histopathological Examination, Pattern Recognition, Graph Generation, Key Features and Support Vector Machine
Cancer is a class of diseases characterized by out-of-control cell growth. There are over 100 different types of cancer, and each is classified by the type of cell that is initially affected. The likelihood of curing cancer increases with early diagnosis and correct grading, for which histopathological examination is routinely used. The number of computational studies on histopathological image analysis has been increasing over the past few years. The main aim of these studies is to automate the diagnosis and grading process in order to reduce the subjectivity that can be observed in histopathological examination. These studies extract features from a histopathological tissue image and use them in automated diagnosis and grading.

Digital pathology provides a digital environment for the management and interpretation of pathology information 
that is enabled by digital slides (virtual slides). The implementation of these systems typically requires a deep analysis of 
biological deformations from a normal to a cancerous tissue as well as the development of accurate models that quantify 
the deformations. These deformations are typically observed in the distribution of the cells from which cancer originates, and thus in the biological structures that are formed of these cells. For example, colon adenocarcinoma, which accounts for 90%-95% of all colorectal cancers, originates from epithelial cells and leads to deformations in the morphology and composition of gland structures formed of the epithelial cells (Figure 1). Moreover, the degree of the deformations in these structures is an indicator of the cancer malignancy (grade). Thus, the correct identification of the deformations and their accurate quantification are critical for precise modeling of cancer.

Although digital pathology systems are implemented for different purposes, including segmentation and retrieval, 
most of the research efforts have been dedicated to tissue image classification. Compared to traditional pathology, major 
advantages of digital pathology are that slides can be viewed via computer monitors, archived and retrieved easily and 
most importantly analysed using software algorithms rather than by manual analysis. Virtual microscopy is the technique 
for creating (via scanners) and viewing (via software) whole-slide images. 




Figure 1: Colon Adenocarcinoma Changes the Morphology and Composition of Colon Glands 

Pattern recognition is a field within machine learning that aims to classify data (patterns) based on either a priori knowledge or statistical information extracted from the patterns. Most existing pattern recognition methods are built from a description scheme, which represents the patterns, and a classification scheme, which assigns them to classes.

The classification or description scheme usually follows one of two approaches: statistical (or decision-theoretic) or syntactic (or structural). Statistical pattern recognition is based on statistical characterizations of patterns: it assumes that the patterns are generated by a probabilistic system and represents each pattern by d features (attributes), viewed as a d-dimensional vector. Structural pattern recognition is based on the structural interrelationships of features and represents patterns as symbolic data structures, such as strings, trees, or graphs. A wide range of algorithms can be applied to pattern recognition, from very simple Bayesian classifiers to much more powerful neural networks.

METHODOLOGY 

The proposed model is a new approach to tissue image classification. It models a tissue image by constructing an attributed graph on its tissue components and describes what a normal gland is by defining a set of smaller query graphs. It searches for the query graphs, which correspond to non-deformed normal glands, over the entire tissue graph to locate the attributed subgraphs that are most likely to belong to a normal gland structure. Features are then extracted from these subgraphs to quantify tissue deformations and, hence, to classify the tissue. The approach includes three steps: graph generation for tissue images and query glands, localization of key regions (attributed subgraphs) that are likely to be a gland, and feature extraction from the key regions. Figure 2 shows the block diagram of the entire system.

Tissue Graph Generation

Graphs are a general and powerful data structure for the representation of objects and concepts. In a graph representation, the nodes typically represent objects or parts of objects, while the edges describe relations between objects or object parts. This model describes a tissue image with an attributed graph $G = (V, E, \mu)$, where $V$ is a set of nodes, $E \subseteq V \times V$ is a set of edges, and $\mu : V \rightarrow A$ is a mapping function that maps each node $v_i \in V$ to an attributed node label $\alpha_i \in A$. This graph representation relies on locating the tissue components in the image, identifying them as the graph nodes, and assigning the graph edges between these nodes based on their spatial distribution.
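The attributed graph described above (a node set, an edge set, and a label mapping on the nodes) maps naturally onto a small container type. The sketch below is a minimal illustration, assuming integer node ids, undirected edges stored as unordered pairs, and string labels; the class and method names are hypothetical and not part of the original model.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedGraph:
    """Attributed graph G = (V, E, mu) with undirected edges and node labels."""
    nodes: set = field(default_factory=set)     # V
    edges: set = field(default_factory=set)     # E, each edge a frozenset {u, v}
    labels: dict = field(default_factory=dict)  # mu: V -> A

    def add_node(self, v, label):
        self.nodes.add(v)
        self.labels[v] = label

    def add_edge(self, u, v):
        # E is a subset of V x V, so both endpoints must already be nodes.
        if u not in self.nodes or v not in self.nodes:
            raise KeyError("both endpoints must be nodes of the graph")
        self.edges.add(frozenset((u, v)))

    def neighbors(self, v):
        return {w for e in self.edges if v in e for w in e if w != v}


g = AttributedGraph()
g.add_node(0, "nucleus")
g.add_node(1, "white")
g.add_node(2, "white")
g.add_edge(0, 1)
g.add_edge(1, 2)
print(sorted(g.neighbors(1)))  # [0, 2]
```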
Figure 2: Block Diagram of Tissue Image Classification System

However, because the exact localization of the components poses a difficult segmentation problem, the model uses an approximation that represents the components with circular objects.

To define these objects, the image pixels are first quantized into two groups: nucleus pixels and non-nucleus pixels. For that, the hematoxylin stain is separated using a stain deconvolution method and thresholded with Otsu's method. Then, a set of circular objects is located on each group of pixels using the circle-fit algorithm. This approximation gives two groups of objects: one defined on the nucleus pixels and the other on the non-nucleus (whiter) pixels. These groups are herein referred to as "nucleus" and "white" objects. After the objects are defined as the graph nodes, their spatial relations are encoded by constructing a tissue graph using Delaunay triangulation. For the example image in Figure 3(a), the constructed tissue graph is illustrated in Figure 3(b), with the centroids of the nucleus and white objects (nodes) shown as black and white circles, respectively.
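Two of the steps above, Otsu thresholding and Delaunay triangulation of the object centroids, can be sketched with the standard library alone. This is an illustration under assumptions: the hematoxylin channel is assumed to be already separated and scaled to integers 0-255, the circle-fit step is not reproduced, and the brute-force Delaunay test (O(n^4), general position assumed) only suits a handful of points; a real pipeline would call an O(n log n) library routine. All function names are hypothetical.

```python
from itertools import combinations


def otsu_threshold(values, levels=256):
    """Otsu threshold for integer intensities in [0, levels).

    Returns t maximizing the between-class variance of {v <= t} and {v > t}.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = sum0 = 0  # pixel count and intensity sum of the class {v <= t}
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def _in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return det * orient > 0  # det > 0 means "inside" only for counterclockwise abc


def delaunay_edges(points):
    """Brute-force Delaunay triangulation; returns edges (i, j) with i < j.

    Keeps every triangle whose circumcircle contains no other point.
    Assumes general position (no four points on a common circle).
    """
    edges = set()
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        if (b[0] - a[0]) * (c[1] - a[1]) == (b[1] - a[1]) * (c[0] - a[0]):
            continue  # collinear triple, not a triangle
        if any(_in_circumcircle(a, b, c, points[m])
               for m in range(len(points)) if m not in (i, j, k)):
            continue
        edges.update({(i, j), (i, k), (j, k)})
    return edges


# Hypothetical object centroids (nucleus and white objects mixed together).
centroids = [(0, 0), (10, 0), (5, 1), (5, -1)]
print(sorted(delaunay_edges(centroids)))  # [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

In this example the long edge between points 0 and 1 is rejected because point 3 falls inside the circumcircle of triangle (0, 1, 2), which is exactly the empty-circle property that makes Delaunay edges connect spatially adjacent objects.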

A normal gland is formed of a lumen surrounded by a monolayer of epithelial cells. The cytoplasm of normal epithelial cells is rich in mucin, which gives it a whitish appearance. Thus, in the ideal case, a query graph consists of many white objects at its center surrounded by a single layer of nucleus objects.



Note that deviations from this ideal case may exist due to noise and artifacts in an image as well as model approximations, as seen in Figure 3(c). Subgraphs generated from normal tissues show an object distribution similar to that of a query graph. On the other hand, the object distribution of subgraphs from cancerous tissues differs, since colon adenocarcinoma causes deviations in the distribution of epithelial cells and changes the whitish appearance of their cytoplasm (the epithelial cells become poor in mucin). Graph edit distance features are used to quantify this difference.

Query Graph Generation 

Query graphs are the subgraphs that correspond to normal gland structures in an image. To define a query graph $G_s$ on the tissue graph $G$ of a given image, a seed node (object) is selected and expanded on $G$ using the breadth-first search (BFS) algorithm until a particular depth is reached.

In graph theory, breadth-first search (BFS) is a strategy for searching a graph when the search is limited to essentially two operations: (a) visiting and inspecting a node of the graph; and (b) gaining access to the nodes that neighbor the currently visited node. BFS begins at a root node and inspects all of its neighboring nodes. Then, for each of those neighbors in turn, it inspects their unvisited neighbors, and so on. Iterative deepening depth-first search reaches nodes in the same level-by-level order while using less memory, whereas plain depth-first search follows one branch as deep as possible before backtracking.

The visited nodes and the edges between them are then taken to generate the query graph $G_s$. In this procedure, the seed node and the depth are selected manually, considering the corresponding gland structure in the image. Figure 3(c) shows this query graph generation on an example image; black and white indicate the selected nodes and edges, whereas gray indicates the unselected ones.
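The depth-limited BFS expansion and the "visited nodes plus the edges between them" step can be sketched as follows. This is a minimal illustration, assuming the tissue graph is given as an adjacency dictionary with comparable (e.g. integer) node ids; the function names are hypothetical, and the seed and depth are passed in, mirroring the manual selection described above.

```python
from collections import deque


def bfs_expand(adjacency, seed, depth):
    """Expand `seed` breadth-first up to `depth` hops.

    Returns a dict node -> BFS level (0 for the seed); farther nodes are skipped.
    """
    level = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if level[node] == depth:
            continue  # do not expand past the requested depth
        for nb in adjacency[node]:
            if nb not in level:
                level[nb] = level[node] + 1
                queue.append(nb)
    return level


def induced_subgraph(adjacency, nodes):
    """Edges (u, v), u < v, of the tissue graph whose endpoints were both visited."""
    return {(u, v) for u in nodes for v in adjacency[u] if v in nodes and u < v}


adjacency = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
levels = bfs_expand(adjacency, seed=0, depth=1)
print(levels)                                       # {0: 0, 1: 1, 2: 1}
print(sorted(induced_subgraph(adjacency, levels)))  # [(0, 1), (0, 2), (1, 2)]
```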

In Figure 3, (a) is the example normal tissue image, (b) is the tissue graph generated for this image, and (c) is a query graph generated to represent a normal gland. The node labels are indicated using four different representations, and the order in which each node is expanded is given inside its corresponding object.

Subsequently, the mapping function $\mu$ attributes each selected node with a label according to its object type and the order in which the BFS algorithm expands it. In particular, four labels are defined: $\alpha_{n\text{-}in}$ and $\alpha_{w\text{-}in}$ for the nucleus and white objects whose expansion order is less than the BFS depth, and $\alpha_{n\text{-}out}$ and $\alpha_{w\text{-}out}$ for the nucleus and white objects whose expansion order is equal to the BFS depth.





Figure 3: An Illustration of the Graph Generation Step (a) An Example Normal Tissue Image (b) The Tissue Graph 
Generated for this Image (c) A Query Graph Generated to Represent a Normal Gland 
The query graph generation and labeling processes are illustrated in Figure 4. In this figure, a query graph is generated by taking the dash-bordered white object as the seed node and selecting a depth of 4. The illustration uses a different representation for the nodes of each label: black circles for $\alpha_{n\text{-}in}$, white circles for $\alpha_{w\text{-}in}$, black circles with green borders for $\alpha_{n\text{-}out}$, and white circles with red borders for $\alpha_{w\text{-}out}$. It also indicates the expansion order of the selected nodes inside their corresponding circles; the order is not indicated for the unselected nodes. The search process for key region localization uses the same algorithm to obtain the subgraphs to which a query graph is compared. However, these subgraphs are generated by taking each object as the seed node and using the same depth as the query graph. Thus, the search process involves no manual selection. The search process is detailed under Key Region Localization.

Key Region Localization 

The localization of key regions in an image involves a search process. This process compares each query graph $G_s$ with subgraphs $G_t$ generated from the tissue graph $G$ of the image and locates the $N$ subgraphs that are most similar to this query graph.




Figure 4: An Illustration of Generating a Query Graph 

The regions corresponding to the located subgraphs are then considered the key regions. Since a query graph is generated to represent a normal gland, the located subgraphs are expected to correspond to the regions that have the highest probability of belonging to a normal gland.

Typically, the subgraphs located on a normal tissue image are more similar to the query graph than those located on a cancerous tissue image. Thus, the similarity levels of the located subgraphs, together with the features extracted from their corresponding key regions, are used to classify the tissue image.

The search process requires inexact graph matching between the query graph and the subgraphs, which is known to be an NP-complete problem. Thus, the model uses an approximation, together with heuristics on the subgraph definition, to reduce the complexity to polynomial time.

Query Graph Search 

Let $G_s = (V_s, E_s, \mu_s)$ be a query graph and $v_s \in V_s$ be its seed node, from which all the nodes in $V_s$ are expanded using the BFS algorithm until the graph depth $d_s$ is reached. To search for this query over the entire tissue graph $G = (V, E, \mu)$, candidate subgraphs $G_t = (V_t, E_t, \mu)$ are first enumerated from $G$.
Figure 5: The Query Graphs Generated as a Reference for a Normal Gland Structure and the Sub Graphs Located 
in Example Normal, Low-Grade Cancerous and High-Grade Cancerous Tissue Images 

To do so, a procedure similar to the one used to generate the query is followed. In particular, each node $v_i \in V$ that has the same label as $v_s$ is taken as a seed node and expanded using the BFS algorithm until the query depth $d_s$ is reached. Note that using the same BFS algorithm, with the same type of seed node and the same graph depth as in generating the query graph $G_s$, prunes many possible candidates and thus yields a smaller candidate set. The nodes of the candidate subgraphs are also attributed with the labels in $A = \{\alpha_{n\text{-}in}, \alpha_{w\text{-}in}, \alpha_{n\text{-}out}, \alpha_{w\text{-}out}\}$ using the mapping function $\mu$ that was used to label the nodes of the query graphs.
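Candidate enumeration and labeling, as described above, amount to running the same depth-limited BFS from every node whose object type matches the query seed and tagging each visited node with one of the four in/out labels. A minimal sketch, assuming an adjacency dictionary, object types encoded as "n" (nucleus) and "w" (white), and label strings such as "w-in" standing in for the four labels; all names are hypothetical, and only the labeled node sets (not the edges) are returned.

```python
from collections import deque


def bfs_levels(adjacency, seed, depth):
    """Node -> BFS level, expanding no further than `depth` hops from `seed`."""
    level = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if level[node] < depth:
            for nb in adjacency[node]:
                if nb not in level:
                    level[nb] = level[node] + 1
                    queue.append(nb)
    return level


def label_nodes(levels, node_type, depth):
    """mu: object type plus 'in' (level < depth) or 'out' (level == depth)."""
    return {v: f"{node_type[v]}-{'in' if lvl < depth else 'out'}"
            for v, lvl in levels.items()}


def enumerate_candidates(adjacency, node_type, seed_type, depth):
    """One labeled candidate subgraph per node whose type matches the query seed."""
    candidates = {}
    for v in adjacency:
        if node_type[v] == seed_type:
            candidates[v] = label_nodes(bfs_levels(adjacency, v, depth), node_type, depth)
    return candidates


adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
node_type = {0: "w", 1: "n", 2: "n", 3: "w"}
print(enumerate_candidates(adjacency, node_type, seed_type="w", depth=1))
```

Because only nodes of the seed's type become seeds and the depth is fixed to the query's depth, the number of candidates is at most the number of same-type objects, which is the pruning effect the text describes.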

In Figure 5, the first image of each row shows the query graph on the image from which it is taken, whereas the remaining ones show the three subgraphs most similar to the corresponding query graph. The subgraphs of the same image are shown in different colors (red for the most similar subgraph, green for the second most similar, and blue for the third most similar). After the candidate subgraphs are obtained, each candidate $G_t$ is compared with the query graph $G_s$ using the graph edit distance metric, and the $N$ most similar non-overlapping subgraphs are selected. To this end, the selection starts with the most similar subgraph and eliminates the other candidates whose seed node is an element of the selected subgraph.

This process is repeated, one selection at a time, until the $N$ most similar subgraphs are selected. For different query graphs, Figure 5 presents the selected subgraphs in example tissue images; only the three most similar subgraphs are shown. Note that although an image may not contain $N$ normal gland structures, the algorithm always locates the $N$ most similar subgraphs, some of which may correspond to more deformed gland structures or to false glands. The model does not eliminate these glands (subgraphs), since the graph edit distances between the query graphs and the subgraphs of more deformed glands are expected to be higher, and this is an important feature for differentiating normal and cancerous tissue images.
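The greedy selection of non-overlapping subgraphs described above can be sketched as follows. This assumes the candidates are keyed by their seed node, each with a precomputed graph edit distance to the query graph; the function and argument names are hypothetical.

```python
def select_non_overlapping(candidates, distances, n):
    """Greedily pick up to n most similar, non-overlapping candidate subgraphs.

    candidates: seed node -> set of nodes in that candidate subgraph
    distances:  seed node -> graph edit distance to the query graph
    After each pick, every candidate whose seed lies inside the picked
    subgraph is discarded.
    """
    remaining = sorted(candidates, key=lambda s: distances[s])
    selected = []
    while remaining and len(selected) < n:
        best = remaining.pop(0)
        selected.append(best)
        covered = candidates[best]
        remaining = [s for s in remaining if s not in covered]
    return selected


candidates = {0: {0, 1, 2}, 1: {0, 1}, 5: {5, 6}, 6: {5, 6, 7}}
distances = {0: 2.0, 1: 1.0, 5: 3.0, 6: 4.0}
print(select_non_overlapping(candidates, distances, n=2))  # [1, 5]
```

Candidate 0 is dropped after candidate 1 is picked because its seed (node 0) lies inside the selected subgraph, even though its own distance is the second smallest.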

Indeed, our experiments reveal that this feature is especially important for the correct classification of high-grade cancerous tissues, since subgraphs generated from these tissues are expected to look less similar to a query graph, leading to higher graph edit distances. These higher distances may be effective in defining more distinctive features. Also note that an image may sometimes contain $N$ normal gland structures while the algorithm incorrectly locates subgraphs that correspond to non-gland benign tissue regions.

Graph Edit Distance Calculation 

To select the subgraphs $G_t = (V_t, E_t, \mu)$ that are most similar to a query graph $G_s = (V_s, E_s, \mu)$, the proposed model uses the graph edit distance algorithm, which provides error-tolerant graph matching. The graph edit distance quantifies the dissimilarity between a source graph $G_s$ and a target graph $G_t$ by calculating the minimum cost of the edit operations needed to transform $G_s$ into $G_t$. This algorithm defines three operations: insertion ($\varepsilon \rightarrow v_t$), which inserts a target node into $G_s$; deletion ($v_s \rightarrow \varepsilon$), which deletes a source node from $G_s$; and substitution ($v_s \rightarrow v_t$), which changes the label of a source node in $G_s$ to that of a target node in $G_t$. These operations allow graphs $G_s$ and $G_t$ of different sizes to be matched with each other. As illustrated in Figure 5, the proposed graph representation together with this graph edit distance algorithm makes it possible to match the query gland regions with regions of different sizes and orientations.

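Let $(e_1, \ldots, e_i, \ldots, e_n) \in P$ denote a sequence of edit operations $e_i$ that transforms $G_s$ into $G_t$, where $P$ denotes the set of such sequences. The graph edit distance $\mathrm{dist}(G_s, G_t)$ is then defined as

$$\mathrm{dist}(G_s, G_t) = \min_{(e_1, \ldots, e_n) \in P} \sum_{i=1}^{n} \mathrm{cost}(e_i)$$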




Figure 6: An Illustration of Minimum Distance Calculation and Graph Matching Process (panels (d) and (e); the matching shown has a graph edit distance of 16)
where $\mathrm{cost}(e_i)$ is the cost of operation $e_i$. Since finding the optimal sequence requires a number of trials that grows exponentially with the number of nodes in $G_s$ and $G_t$, this algorithm decomposes the graphs $G_s$ and $G_t$ into sets of subgraphs, each of which contains a node of the graph and its immediate neighbors. The algorithm then transforms the graph matching problem into an assignment problem between the subgraphs of $G_s$ and $G_t$ and solves it using the Munkres algorithm.
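The decomposition into node-plus-neighbors subgraphs and the assignment step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' cost model: a node substitution costs 1 when labels differ, deleting or inserting a node costs 1, and each unmatched incident edge costs 0.5 per endpoint (an assumed convention, since every edge belongs to two such subgraphs). Exhaustive permutation search stands in for the Munkres algorithm, so this version only works for graphs with a handful of nodes; function names are hypothetical.

```python
from itertools import permutations

INF = float("inf")


def star_cost_matrix(labels_s, adj_s, labels_t, adj_t, edge_cost=0.5):
    """Square (n+m) x (n+m) cost matrix over node-plus-neighbors subgraphs."""
    ns, nt = list(labels_s), list(labels_t)
    n, m = len(ns), len(nt)
    c = [[0.0] * (n + m) for _ in range(n + m)]
    for i, u in enumerate(ns):
        for j, v in enumerate(nt):
            # Substitution: label change plus edges that cannot be matched.
            c[i][j] = (labels_s[u] != labels_t[v]) + edge_cost * abs(len(adj_s[u]) - len(adj_t[v]))
        for j in range(n):
            c[i][m + j] = 1 + edge_cost * len(adj_s[u]) if i == j else INF  # delete u
    for i, v in enumerate(nt):
        for j in range(m):
            c[n + i][j] = 1 + edge_cost * len(adj_t[v]) if i == j else INF  # insert v
    return c  # the bottom-right (dummy-to-dummy) block stays 0


def approx_ged(labels_s, adj_s, labels_t, adj_t):
    """Cost of the cheapest assignment (brute force in place of Munkres)."""
    c = star_cost_matrix(labels_s, adj_s, labels_t, adj_t)
    size = len(c)
    return min(sum(c[r][p[r]] for r in range(size)) for p in permutations(range(size)))


# Query: a white node linked to a nucleus node; target: a lone white node.
print(approx_ged({0: "w-in", 1: "n-out"}, {0: [1], 1: [0]},
                 {0: "w-in"}, {0: []}))  # 2.0
```

With these costs the example needs one node deletion and one edge deletion, so the assignment cost of 2.0 matches the exact edit distance; in general the assignment cost is only an approximation of it.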

The shortest (minimum-cost) edit path can then be obtained using Dijkstra's shortest-path algorithm.

Feature Extraction and Classification 

Feature extraction is the transformation of the original data (using all variables) into a dataset with a reduced number of variables. In feature extraction, all available variables are used, and the data are transformed (using a linear or nonlinear transformation) into a lower-dimensional space. A tissue image $I$ is first characterized by extracting two types of local features, and the image is then classified using a linear-kernel support vector machine (SVM) classifier. The first feature type quantifies the structural tissue deformations observed in the image; graph matching is used for this purpose. However, since a standard SVM classifier does not work with graphs, we embed the graph edit distances of the matchings in a feature vector $D$.
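The section states that a linear-kernel SVM performs the classification but does not name an implementation. As a dependency-free stand-in, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method; it is not necessarily what the authors used. It assumes classes encoded as -1 (low grade) and +1 (high grade) for training, omits the bias term on the assumption that features are centered, and reports class 0 for low grade as in the Results section. The names and hyperparameters are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos training of a linear SVM without bias; y must be in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1:  # hinge loss active: step along y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Class 1 (high grade) if w.x > 0, otherwise class 0 (low grade)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


# Hypothetical, centered two-dimensional feature vectors.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, 0, 0, 0]
```

In practice a tested library implementation (for example LIBSVM with a linear kernel) would replace this loop; the sketch only shows what the linear decision rule computes.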

To do so, for each query graph $G_s$, we calculate the average graph edit distance between $G_s$ and its $N$ most similar subgraphs in image $I$. Let $\{G_{st}\}_{t=1}^{N}$ be the set of the $N$ most similar subgraphs for the query graph $G_s$. The average graph edit distance $\bar{d}_s$ for this query graph is

$$\bar{d}_s = \frac{1}{N} \sum_{t=1}^{N} \mathrm{dist}(G_s, G_{st}) \qquad (1)$$

where $\mathrm{dist}(G_s, G_{st})$ is the graph edit distance between the query graph $G_s$ and the subgraph $G_{st}$. The structural tissue deformations in image $I$ are then characterized by the feature vector $D = [\bar{d}_1, \bar{d}_2, \ldots, \bar{d}_S]$, with one entry per query graph ($S$ query graphs in total). The second feature type quantifies the textural changes observed in the key regions. In our model, we focus on the outer parts of the key regions, because the changes caused by colon adenocarcinoma are typically observed in epithelial cells, which line the gland boundaries.
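Equation (1) and the structural feature vector D reduce to a per-query average. A minimal sketch, assuming the graph edit distances between each query graph and its N located subgraphs have already been computed; the function name and the numbers are hypothetical.

```python
def structural_features(query_distances):
    """Feature vector D: mean graph edit distance per query graph (Eq. 1).

    query_distances: one list per query graph G_s, holding dist(G_s, G_st)
    for its N most similar subgraphs G_st in the image.
    """
    return [sum(dists) / len(dists) for dists in query_distances]


# Two hypothetical query graphs with N = 3 located subgraphs each.
D = structural_features([[4.0, 6.0, 8.0], [10.0, 12.0, 14.0]])
print(D)  # [6.0, 12.0]
```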

To extract the second type of features, we locate a window on the outer nodes of the subgraphs and extract four simple features from the window pixels, which are quantized into three colors using k-means. The first three features are the histogram ratios of the quantized pixels, and the last one is a texture descriptor (J-value) that quantifies their uniformity. The three colors correspond to white, pink, and purple, which are the dominant colors in a tissue stained with hematoxylin and eosin.
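The four textural features described above can be sketched as follows: k-means quantization of window pixels into three colors, the histogram ratios of the three colors, and a J-value computed as in the JSEG segmentation method, J = (S_T - S_W) / S_W, where S_T is the total scatter of pixel positions and S_W the summed within-color scatter. Assumptions: the window is given as RGB tuples, k-means is initialized deterministically with the first k distinct colors, and window placement on the outer nodes is not reproduced. Function names are hypothetical.

```python
def kmeans_colors(pixels, k=3, iters=20):
    """Plain k-means on RGB tuples; returns one cluster index per pixel."""
    centers = []
    for p in pixels:  # deterministic init: the first k distinct colors
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break
    assign = [0] * len(pixels)
    for _ in range(iters):
        assign = [min(range(len(centers)),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in pixels]
        for c in range(len(centers)):
            members = [p for p, lab in zip(pixels, assign) if lab == c]
            if members:
                centers[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return assign


def texture_features(labels_2d, k=3):
    """Histogram ratios of the k quantized colors, plus the J-value of the window."""
    pts = [(r, c, lab) for r, row in enumerate(labels_2d) for c, lab in enumerate(row)]
    ratios = [sum(1 for p in pts if p[2] == i) / len(pts) for i in range(k)]

    def scatter(group):
        # Sum of squared distances of pixel positions to their mean position.
        if not group:
            return 0.0
        mr = sum(p[0] for p in group) / len(group)
        mc = sum(p[1] for p in group) / len(group)
        return sum((p[0] - mr) ** 2 + (p[1] - mc) ** 2 for p in group)

    s_t = scatter(pts)  # total scatter S_T
    s_w = sum(scatter([p for p in pts if p[2] == i]) for i in range(k))  # S_W
    j_value = (s_t - s_w) / s_w if s_w > 0 else float("inf")
    return ratios + [j_value]


print(texture_features([[0, 0, 1, 1], [0, 0, 1, 1]]))  # [0.5, 0.5, 0.0, 2.0]
```

To combine the two, the flat list returned by `kmeans_colors` would be reshaped into the window's rows before calling `texture_features`. Cluster indices from k-means are arbitrary, so mapping them to white, pink, and purple (for example by sorting cluster centers by brightness) is a further assumption left out here. A large J-value means each color forms compact, separated regions; a well-mixed window gives a value near zero.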

RESULTS AND DISCUSSIONS

In this model, structural features are represented by graphs and textural features by d-dimensional feature vectors. The developed model uses both structural and textural features to classify the images into two groups, low-grade and high-grade tissue images, based on various features extracted from the images. The features extracted from the images are contrast, energy, homogeneity, and correlation. The image classification is then performed with SVM classifiers. If the SVM classifier outputs class 0, the image is classified as a low-grade image; otherwise, it is classified as a high-grade tissue image. In the classification results, the low-grade images appear close to normal images, whereas the high-grade images deviate from normal images.
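The four features listed above are standard gray-level co-occurrence matrix (GLCM) statistics. The section does not state the pixel offset, the number of gray levels, or whether the matrix is symmetric, so the sketch below assumes a horizontal offset of one pixel and a symmetric, normalized matrix; the formulas follow the definitions documented for MATLAB's graycoprops (energy as the sum of squared GLCM entries). The function name is hypothetical.

```python
def glcm_features(img, levels, dr=0, dc=1):
    """Contrast, correlation, energy and homogeneity of a symmetric, normalized GLCM.

    img: 2-D list of gray levels in [0, levels); (dr, dc) is the pixel offset.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = img[r][c], img[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    p = [[v / total for v in row] for row in counts]
    idx = range(levels)
    mu_i = sum(i * p[i][j] for i in idx for j in idx)
    mu_j = sum(j * p[i][j] for i in idx for j in idx)
    sd_i = sum((i - mu_i) ** 2 * p[i][j] for i in idx for j in idx) ** 0.5
    sd_j = sum((j - mu_j) ** 2 * p[i][j] for i in idx for j in idx) ** 0.5
    contrast = sum((i - j) ** 2 * p[i][j] for i in idx for j in idx)
    energy = sum(p[i][j] ** 2 for i in idx for j in idx)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i in idx for j in idx)
    if sd_i > 0 and sd_j > 0:
        correlation = sum((i - mu_i) * (j - mu_j) * p[i][j]
                          for i in idx for j in idx) / (sd_i * sd_j)
    else:
        correlation = float("nan")  # undefined for a constant window
    return {"contrast": contrast, "correlation": correlation,
            "energy": energy, "homogeneity": homogeneity}


print(glcm_features([[0, 1], [1, 0]], levels=2))
# {'contrast': 1.0, 'correlation': -1.0, 'energy': 0.5, 'homogeneity': 0.5}
```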
Accuracy is also calculated from the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) values, which are defined by the confusion matrix. The simulation results were obtained with an image processing tool. Tables 1 and 2 compare the feature values and the accuracy values, respectively.
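The measures in Table 2 follow from confusion-matrix counts in the usual way: sensitivity is TP / (TP + FN), specificity is TN / (TN + FP), and accuracy is (TP + TN) divided by all cases. A minimal sketch; the counts in the example are hypothetical and are not the counts behind Table 2.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy, in percent, from confusion-matrix counts."""
    sensitivity = 100.0 * tp / (tp + fn)  # true positive rate
    specificity = 100.0 * tn / (tn + fp)  # true negative rate
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


print(confusion_metrics(tp=45, tn=40, fp=10, fn=5))  # (90.0, 80.0, 85.0)
```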

CONCLUSIONS 

The developed novel model makes use of both structural and statistical pattern recognition techniques for tissue image classification. It represents a tissue image as an attributed graph of its components and characterizes the image with the properties of its key regions. The main contribution of this work is the localization and characterization of the key regions. The proposed model uses inexact graph matching to locate the key regions.

To this end, it defines a set of query graphs as a reference for a normal gland structure and specifies the key regions as the subgraphs of the entire tissue graph that are structurally most similar to the query graphs. Our model then characterizes the key regions using the graph edit distances between the query graphs and their most similar subgraphs, as well as textural features extracted from the outer parts of the key regions.

Classification is then performed with SVM classifiers. The proposed model provides improved classification accuracies compared to conventional classification methods that use only statistical pattern recognition for tissue image classification.



Table 1: Feature Value Comparison

Extracted Feature    Low Grade    High Grade
Contrast             0.8593       0.4672
Correlation          0.8866       0.84430
Energy               0.0818       0.8861
Homogeneity          0.1332       0.8256

Table 2: Accuracy Comparison

Output Image    Sensitivity (%)    Specificity (%)    Accuracy (%)
Low grade       80                 50                 75
High grade      90                 50                 83



FUTURE WORK 

In future work, a feed-forward neural network classifier could be implemented to improve the speed of operation and to extend the classification to a larger number of classes.